Hierarchical graph learning for protein–protein interaction

Protein-Protein Interactions (PPIs) are fundamental means of functions and signalings in biological systems. The massive growth in demand and cost associated with experimental PPI studies calls for computational tools for automated prediction and understanding of PPIs. Despite recent progress, in silico methods remain inadequate in modeling the natural PPI hierarchy. Here we present a double-viewed hierarchical graph learning model, HIGH-PPI, to predict PPIs and extrapolate the molecular details involved. In this model, we create a hierarchical graph, in which a node in the PPI network (top outside-of-protein view) is a protein graph (bottom inside-of-protein view). In the bottom view, a group of chemically relevant descriptors, instead of the protein sequences, are used to better capture the structure-function relationship of the protein. HIGH-PPI examines both outside-of-protein and inside-of-protein of the human interactome to establish a robust machine understanding of PPIs. This model demonstrates high accuracy and robustness in predicting PPIs. Moreover, HIGH-PPI can interpret the modes of action of PPIs by identifying important binding and catalytic sites precisely. Overall, “HIGH-PPI [https://github.com/zqgao22/HIGH-PPI]” is a domain-knowledge-driven and interpretable framework for PPI prediction studies.


Point-by-point responses to reviewer comments:
NCOMMS-22-42373 "Hierarchical Graph Learning for Protein-Protein Interaction" by Ziqi Gao, et al.
We are very grateful to the Editor and Reviewers for their critical comments on the original manuscript. We have now addressed in detail all your concerns, which, we think, has greatly improved the quality and readability of our paper.
Please find the point-by-point responses (in black) to reviewer comments (in blue). All edits have been colored in red in our revised manuscript version and the revised supplementary information.

Reviewer #1 (Remarks to the Author):
The paper titled "Hierarchical Graph Learning for Protein-Protein Interaction" proposed a new computational method for prediction protein-protein interactions. This method integrated the protein-protein interaction network and protein 3D structure information to achieve good performances. The authors proposed that these two kinds of information can be integrated together by hierarchical graph model. Graph neural network-based methods can be developed on the hierarchical graph. Particularly, the authors proposed two views of protein-protein interaction, the bottom inside-of-protein view and the top outside-of-protein view. The two views described protein-protein interactions at two different scale, the residue-residue intra-molecular scale and the protein-protein inter-molecular scale. This kind of information integration is new under the topic of protein-protein interaction. The application of graph neural network, or more precisely GCN and GIN is good with sophisticated designs, achieving an end-to-end predictive model. The validation results of the method show that it has better prediction performance, good robustness and good generalization ability. I think this is an innovative contribution to the bioinformatics field with great potential. I have several minor comments, which I would like the authors to clarify or just verify in the revisions.
We want to thank you for your appreciation of our effort and for highly recognizing that HIGH-PPI may offer a lot of potential for the bioinformatics community. To address your concerns, we provide point-by-point responses as below.
(1) I believe that the authors knew that the quality of high-throughput protein-protein interaction data is always a problem in predicting protein-protein interaction. This problem has two sides. One is that the positives may contain a considerable number of false positives. The other is that the negatives are not reliable. Both sides affect machine learning models. The authors applied, "perturbations including random addition or removal to known interactions", in order to prove the robustness of their methods. This is good. However, robust models may bring in false discovery in practical applications. I think it would be better if the authors can perform some randomization or permutation tests to prove the statistical quality of their method. If the authors feel this is of too much work or not necessary, they may try to make some discussions.
Thank you for the question on the robustness evaluation in our paper and for suggesting a more systematic evaluation protocol. We agree with you that it is necessary and valuable to investigate the statistical quality of false positive (FP) and false negative (FN) performance, particularly for the protein-protein interaction (PPI) data that may not be reliable. To address this issue, we have made some discussions of our robustness evaluation experiment (Fig. 2b) and conducted an additional experiment to provide evidence.
On one hand, we respectfully clarify the intuition of the robustness evaluation (Fig. 2b) in our manuscript. We collectively refer to the potential FP and FN samples as unreliability. In essence, both the training set and the test set conceal unreliability in the involved dataset (e.g., SHS27k). The most ideal robustness evaluation requires us to keep the training set containing unreliability unchanged and remove the unreliable labels from the test set, but this is unfortunately an intractable problem. Thus, in order to conduct the robustness evaluation experiments, we made a compromise that we consider the original dataset to be reliable and artificially add perturbations to represent data unreliability. Importantly, perturbations are only added to the training set labels, while the test set labels remain unchanged (reliable). This simulates the fact that the PPI data we use in the training model is inherently unreliable, while we evaluate the trained model with its predictions and reliable test set labels.

Table R1. Statistics of FPR_train and FNR_train of the 9 datasets created for training the models.
On the other hand, to address your concern, we designed the following experimental protocol and compared the statistical performance of our approach, HIGH-PPI, with a solid baseline (i.e., GNN-PPI). In brief, we want to explore whether the unreliability of the data causes unreliability in our model. First, we assume that the original SHS27k dataset has almost no FP or FN PPI samples. We refer to the original dataset as reliable data and to a dataset containing FP or FN PPI samples as unreliable data. Second, we create 9 unreliable datasets with various FP rates (FPR_train) and FN rates (FNR_train), shown in Table R1. Then, we report robustness evaluation metrics including FPR, FNR and false discovery rate (FDR) [...] Table R1. The computation algorithm of the permutation test (PT) for a single metric is summarized in the following pseudo-code.
Computation algorithm for permutation test
Input: the scores (e.g., FPR_pre) of HIGH-PPI and GNN-PPI [...]
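Because the pseudo-code is cut off in this copy, the following is only a minimal sketch of a generic two-sample permutation test on the difference of mean scores, not necessarily the exact procedure used in the response. The function name `permutation_test`, the number of permutations, and the example scores are illustrative assumptions.

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    scores_a, scores_b: per-run metric values (e.g., FPR) of two models.
    Returns an estimated p value for the null hypothesis that both sets
    of scores come from the same distribution.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:a.size].mean() - shuffled[a.size:].mean())
        # Small tolerance so rounding noise does not hide ties.
        if diff >= observed - 1e-12:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_permutations + 1)

# Hypothetical scores for illustration only (not taken from Table R2 or R3).
model_a = [0.10, 0.11, 0.12, 0.09, 0.10]
model_b = [0.30, 0.31, 0.29, 0.32, 0.30]
p_value = permutation_test(model_a, model_b, n_permutations=2000)
```

A smaller p value means that a difference as large as the observed one rarely arises when the model labels are shuffled at random.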

Results and Discussions:
Experiment 1 (Table R2) mainly evaluates the unreliability of the predictions of models trained with reliable data. HIGH-PPI achieves better average results on every metric, with statistical significance assessed by permutation tests. More precisely, our model shows insignificant superiority (p value = 0.03842) in the FPR_pre metric, and considerable superiority in the FNR_pre (p value = 0.00012) and FDR_pre (p value = 0.00015) metrics, which illustrates the performance superiority of our model and leads to the following more in-depth analysis.
The primary goal of AI-assisted PPI prediction is to effectively discover or choose possible protein partners for further validation. In our opinion, ensuring that "more real PPIs are selected by AI models" (i.e., TP or recall) is of greater significance than ensuring that "more PPIs selected by AI models are real". In this regard, we are glad to find that our model significantly outperforms the competition, particularly on the TP-related metrics (i.e., FNR_pre, FDR_pre). We sincerely hope that future work can optimize all robustness metrics in a comprehensive manner.
Experiment 2 (Table R3) mainly evaluates the model robustness against unreliable training data. Even when our proposed model encounters unreliable data, it still achieves better average results on all three metrics. Surprisingly, while retaining the original significance in FNR_pre and FDR_pre, our model substantially improves the superiority significance in the FPR_pre metric (p value: 0.03842 → 0.00004). This displays good statistical quality of our model in terms of the robustness evaluation. Moreover, we note that FDR_pre is most impacted by unreliable data (largest change), mostly because the number of false positives (FP) increases significantly (up to 6 times) from Data 1 to Data 9.
As a result, the model appears to be more sensitive to FPR_train and passes many protein pairs that are not genuinely true, even though FPR_train and FNR_train are the same in each unreliable dataset. Simply put, the difficulty of the follow-up validation correlates directly with the model's robustness, so unreliable data come at the expense of more time-consuming follow-up experimental validation. To address this issue, we suggest a straightforward option for further research. A possible explanation for the rise in FP is that the model's "low demand" for a positive sample permits certain controversial samples to be predicted as true. In light of this, we recommend that future research use a voting strategy, which uses the voting outcomes of various independent classifiers to identify true PPIs. Independence makes it unlikely for voting classifiers to commit the same errors, which raises the bar for a positive prediction: a sample must be approved by the majority of voting classifiers.
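The voting strategy recommended above can be sketched as follows. This is a minimal illustration under stated assumptions: each classifier outputs a probability per protein pair, a vote is cast when that probability reaches a threshold of 0.5, and a pair is accepted only on a strict majority. The function name `majority_vote` and the example numbers are hypothetical.

```python
import numpy as np

def majority_vote(probabilities, threshold=0.5):
    """Accept a pair as interacting only if most classifiers vote for it.

    probabilities: array-like of shape (n_classifiers, n_pairs) holding
    each classifier's predicted interaction probability.
    Returns a boolean array of shape (n_pairs,).
    """
    probs = np.asarray(probabilities, dtype=float)
    votes = probs >= threshold           # one vote per classifier and pair
    n_classifiers = probs.shape[0]
    return votes.sum(axis=0) > n_classifiers / 2

# Three hypothetical classifiers scoring three candidate pairs.
scores = [
    [0.9, 0.6, 0.2],
    [0.8, 0.4, 0.1],
    [0.7, 0.3, 0.6],
]
accepted = majority_vote(scores)
```

In this example only the first pair is approved by a majority; the second and third pairs each receive a single vote and are rejected, which is the intended effect of raising the bar for positive predictions.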

Supplementary notes:
1. We have stored more detailed results (e.g., specific values for TP) in a csv file (named revision_1.csv) here so that you can view them quickly and conveniently.
2. We have revised the manuscript according to this comment. Specifically, we have added experimental analysis on robustness evaluation in the third paragraph of Section 2.2 (Line 174-185).
"Furthermore, we perform false discovery on our method, which investigates the effect of the training data unreliability (i.e., false negative (FN) and false positive (FP)) on our model and a solid baseline (GNN-PPI). Specifically, we consider the original dataset to be reliable and artificially add perturbations to represent data unreliability. Supplementary Table 1 shows the created 9 datasets with different FP rates (FPR_train) and FN rates (FNR_train). We respectively train the model on the reliable training set and created 9 unreliable ones and present the FP rates (FPR_pre), FN rates (FNR_pre) and false discovery rates (FDR_pre) metrics on the test sets (see Supplementary Table 2 and 3). Without unreliability, our model achieves best performance with insignificant superiority (*p = 3.8 × 10^-2) in the FPR_pre metric, and considerable superiority in the FNR_pre (***p = 1.2 × 10^-4) and FDR_pre (***p = 1.5 × 10^-4) metrics.
When introducing data unreliability, we are surprised to find that our model substantially improves the superiority significance in the FPR_pre metric (****p = 4.0 × 10^-5) while retaining the original significance in FNR_pre and FDR_pre. In addition to showing the excellent robustness of our model, we also provide more in-depth insights in Section 3.2."
3. We have added results for robustness evaluation experiments in the table form (Supplementary Table 2, 3) to the Supplementary Information file.
4. We have added the comments for future work on improving model robustness in Section 3.2 (Line 393-402).
"(4) In the future work, model robustness can be further improved. Although our model outperforms in the robustness evaluation (see Supplementary Table 3), we observe that FDR_pre is most impacted by unreliable data, which is mostly because the number of FP significantly increases (up to 6 times) from Data 1 to Data 9. A possible explanation for the significant rise in FP is that the model's "low demand" for a positive sample permits certain controversial samples to be projected as true. To address this issue, we recommend the future work to consider a straightforward method, the voting strategy, which uses the voting outcomes of various independent classifiers to identify true PPIs. Independence makes it unlikely for voting classifiers to commit the same errors. A test pair can only be predicted as true if it is approved by most voting classifiers, which makes the model more demanding for the PPI presence."
(2) In Fig3(d), the authors analyzed the importance of different properties in their model. I am not sure whether this proves their choices of the seven physicochemical properties. There are about 1/3 "blocks" in a very light color, indicating a near zero z-score. The authors may need to make more discussions on why and how they choose the seven physicochemical properties, but not others.
Thanks for your insightful comments. We strongly agree that more discussion of the selection of physicochemical properties is necessary in our work. We respectfully clarify that the feature importance exploration in the manuscript (Fig. 3d) is intended to better explain our model and to explore potential biomarkers of PPIs. However, we realize that the description of feature importance in the manuscript was ambiguous, and we apologize for this. In fact, the 7 features shown in Fig. 3d were chosen from 12 candidate features (Table R4), and these 7 features constitute the optimal set that enables the model to operate at its peak. 'A near zero z-score' therefore indicates relative importance among the features that were already selected as important in Fig. 3d.
Then, to address your question about "how they choose the seven physicochemical properties", we outline the process to choose 7 crucial features out of 12 available options, and we provide explanations based on domain knowledge in physicochemical processes. All accessible properties (with links) that we can find at the amino acid level for preparation are listed in Table R4.

Feature selection experiments:
Here, we humbly offer a succinct explanation of feature selection in this work. AI models can become more accurate and run faster by selecting the best subset of input information. Extremely fast feature selection is made possible by well-known model-independent methods such as maximizing correlation coefficient and maximizing mutual information. The model-independent feature selection methods, however, can only estimate the performance of AI models up to a certain point and might not be appropriate for all of them. Despite the potential time commitment, we employ model training-based methodologies to choose the best subset of features for our proposed model. Below, we describe the model-dependent feature selection approach.

Experiment process:
For each candidate feature, we train and test our model on the dataset with that feature dimension removed entirely (note: not zero-padded). We run 3 seeds for each removed feature and define the feature importance as the negative of the average best-F1 score. Importance values are obtained for all 12 optional features, and their z-scores are then computed.
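The importance computation described above can be sketched as follows. This is a minimal illustration: the function name `importance_z_scores`, the use of the population standard deviation, and the example F1 values are assumptions, not the authors' code or results.

```python
import numpy as np

def importance_z_scores(f1_without_feature):
    """Turn leave-one-feature-out F1 scores into importance z-scores.

    f1_without_feature: dict mapping a feature name to the best-F1 scores
    (one per seed) obtained after removing that feature.
    Importance is the negative mean F1: the more the score drops when a
    feature is removed, the more important that feature is.
    """
    names = list(f1_without_feature)
    importance = np.array([-np.mean(f1_without_feature[n]) for n in names])
    std = importance.std()  # population standard deviation (assumption)
    if std == 0:
        raise ValueError("all features have identical importance")
    z = (importance - importance.mean()) / std
    return dict(zip(names, z))

# Hypothetical scores for three features and three seeds each.
example = {
    "feature_a": [0.80, 0.82, 0.81],
    "feature_b": [0.86, 0.86, 0.86],
    "feature_c": [0.85, 0.86, 0.87],
}
z_scores = importance_z_scores(example)
```

Here `feature_a` receives the highest z-score because removing it lowers the F1 score the most.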

Results:
In Table R5, we display the mean of the importance for each kind of feature. We also show the z-scores and the final sort results. Following the sorting result, we gradually increase the feature dimension from Topological Polar Surface Area (ranked 1st). The AUPR and F1 results peak once the feature of Octanol-Water Partition Coefficient (ranked 7th) is included. Thus, we ultimately settled on the seven physicochemical properties shown in the manuscript.
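The incremental selection step can be sketched as a loop over the ranked features that keeps the prefix with the best score. The `evaluate` callback is hypothetical: it stands in for training the model on the given feature subset and returning a score such as AUPR or best-F1. The toy scoring function below is for illustration only.

```python
def select_top_k(ranked_features, evaluate):
    """Grow the feature set in rank order and keep the best-scoring prefix.

    ranked_features: feature names sorted from most to least important.
    evaluate: callable taking a list of feature names and returning a
    score (higher is better); assumed to wrap model training and testing.
    Returns the selected features and their score.
    """
    best_score = float("-inf")
    best_k = 0
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        if score > best_score:
            best_score, best_k = score, k
    return ranked_features[:best_k], best_score

# Toy scoring function that peaks when exactly three features are used.
def toy_evaluate(features):
    return -(len(features) - 3) ** 2

selected, score = select_top_k(["f1", "f2", "f3", "f4", "f5"], toy_evaluate)
```

With a real evaluation function, each call would involve a full training run, so the loop costs one training run per candidate subset size.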

Explanations:
To address your question about "why they choose the seven physicochemical properties", we also justify some of the selected features with domain knowledge in biology and chemistry mined from published papers.
[R2] Bteich, M. An overview of albumin and alpha-1-acid glycoprotein main characteristics: highlighting the roles of amino acids in binding kinetics and molecular interactions. Heliyon 5(11) (2019).

Supplementary notes:
We have revised the manuscript according to this comment.
1. We have clarified the different motivations for feature selection and feature importance computation in the fifth paragraph of Section 2.3 (Line 256-261).
"For node features in protein graphs, we first select seven important features from twelve residue-level feature options (see Supplementary Table 4) that are easily available. The feature selection process (see Supplementary File 1 for details) produces the optimal set consisting of seven features to ensure that our model peaks at both AUPR and best-F1 scores. Here, we list the selected seven residue-level physicochemical properties in Fig. 3d and discuss their importance for different types of PPIs to both better interpret our model and discover enlightening biomarkers for PPI interface."
2. We have added a description of the feature selection process (7 important features out of 12 optional ones) and the selection results in the Supplementary File 1.
(3) In the discussions, the authors mentioned that a future work may be considered at atomic level. I think there is a middle-scale view, which is at domain-scale. Maybe this can be discussed. Surely, if the authors feel this is unnecessary or not proper in this paper, they may choose to ignore this.
We appreciate your constructive comments on the future work of hierarchical graph learning. We strongly agree that the protein domain may offer a useful middle-scale view on the PPI problem. From both structural and functional standpoints, protein domains fall between amino acids and proteins. Protein domains are compact, foldable, three-dimensional structures made of amino acid residues. The three-dimensional structure of the complete protein is made up of several protein domains acting as structural building blocks. In terms of function, each domain is responsible for a particular protein function. The specificity of protein activities arises from the assembly of various domains, which also controls the presence or absence of PPIs and hot spots at the PPI interface. Therefore, both in terms of structures and functions, the protein domain represents a crucial middle scale for the PPI problem.
We believe that it would be more appropriate to introduce the domain scale in a separate work. To the best of our knowledge, most domain annotations are acquired through computational prediction tools like InterPro [R10] and SMART [R11], which might offer inaccurate domain data. Although it makes intuitive sense that the protein domain is an intermediate level in the PPI problem, we had not previously given the idea much thought (at least not until the reviewer brought it up), mostly due to the dearth of trustworthy domain annotations. Much more work remains to be done on top of HIGH-PPI to figure out how to create an end-to-end AI model for the residue-domain-protein hierarchy and guarantee the consistency of representation between adjacent views. In addition, introducing the middle-scale view can be considered a process of data upscaling, which could yield more valuable information or, on the contrary, a loss of accuracy due to data redundancy. Based on the findings of the HIGH-PPI evaluation alone, our downscaled double-viewed model, which temporarily ignores the domain scale, performs satisfactorily, and we anticipate that the addition of new data will maintain this performance. Nevertheless, we respectfully recommend further research on multi-view hierarchical learning for PPI prediction.
About future work:
① (Information benefits) It has been demonstrated that the proposed double-viewed hierarchical model, which is based on the high degree of information complementarity between the data of the two views, benefits from both views. For instance, the PPI network (top view) will suggest active protein information (i.e., proteins with a high network degree in the top view) to the bottom view. Similarly, the bottom view helps the PPI network to complement residue-level fragmentation knowledge, i.e., the knowledge of the functions performed by residue fragments for a particular PPI instance. The two current views would benefit from accurate domain annotations because they provide data on functional residue segments that are crucial to PPIs. However, when the domain scale is employed as a separate view, data unreliability could spread to other views and may impair the hierarchical model as a whole. Therefore, for the information gain in the hierarchical model, an early reliability assessment of the available domain annotations is required.
② (Supervised information at the residue level) HIGH-PPI now enables interpretable identification of PPI-related functional sites without supervision from domain data. Therefore, directly supervising the selection of significant functional sites is a straightforward way for the double-viewed hierarchical model to gain from domain information. Precisely, a well-designed regularization is required to guarantee that all functional sites discovered by HIGH-PPI belong to the prepared domain database. The domain regularization and the PPI prediction loss form a flexible trade-off of learning objectives, which can appropriately tolerate the domain annotation unreliability.
③ (Computational efficiency) It is worth mentioning that the explosion of data required for multi-view hierarchical modeling necessitates the use of lightweight backbones for quick training and prediction. Our suggested hierarchical graph learning backbone, a memory- and compute-efficient backbone for simultaneously learning the structure-function relationship, may therefore provide insight for future studies.
④ (Protein structure modeling) The research on protein structure modeling, which is critical for understandable mining of essential interaction areas, needs to be continued in the upcoming work.
"(2) Protein domain information may be beneficial for hierarchical models. We clarify the core ideas here and provide detailed description in Supplementary File 2. Domains are distinct functional or structural units in proteins and are responsible for PPIs and specific protein functions. Both in terms of structures and functions, the protein domain can represent a crucial middle scale for the PPI hierarchy. However, to our knowledge, true (native) domain annotations are not easily available and predicted ones are usually retrieved from computational tools, which inevitably leads to the data unreliability. If we employ the domain scale as a separate view, data unreliability may spread to other views and impair the entire hierarchical model. On this basis, we prefer to recommend domain annotations as supervised information at the residue level. Precisely, a well-designed regularization is required to guarantee that all functional sites, discovered by HIGH-PPI, belong in the prepared domain database. The domain regularization and the PPI prediction loss form a flexible trade-off of learning objectives, which can appropriately tolerate the domain annotation unreliability."
2. We have also added detailed discussion of the domain scale in the Supplementary File 2.
(4) Although the authors have shared the data and code, I still have no time to validate/reproduce the results myself. However, I managed to browse the data and have a suggestion. The authors should share the data in pure text format, like csv/tsv/txt etc., but not the numpy/npy format, which is not convenient for quick browsing.
Thanks for your kind comment. Clear data visualization for convenient browsing is also one of our goals. To address this issue, we have converted some of the files into the csv or txt format which the audience may wish to view quickly. The audience can access the data file here for quick browsing.
We have presented some of the screenshots of those files below (Table 1R-3R). To facilitate viewing the graph-structured data of proteins, we generated adjacency matrix heatmaps for some proteins in the SHS27k dataset (see examples in Table 4R).
(5) There are several minor confusing writings in the text and math, I think the authors should verify them. I also suggest the authors to proof read the manuscript again, as some kind of "hurry" can be felt when reading its current version.
Thanks for pointing this out. We agree and have made the following edits.
On Page 6, legends of the fig1 "SEPRINA3" should be "SERPINA3"? Please notice the spelling
We have replaced the typo "SEPRINA3" with "SERPINA3" in the fourth line of the Fig. 1 legend.
On Page 18, Line 332, I think it is "3.2", not "3.1"
We have changed the section number in Line 360 (originally Line 332) to "3.2".
On Page 20, Line 363, "gb \in gt", this does not make sense. I think it should be "gb \in Vt", according to the words on Line 362
We have changed the notation "g_b ∈ g_t" to "g_b ∈ V_t" in Line 411 (originally Line 362).
We have changed the wrong notation "0 \ + 0 \ " to "0 s + 0 \ " in Line 484 (originally Line 433).
We apologize for the confusion caused by the typos above. As a precaution, we have performed a thorough double-check of the writing details in our manuscript.

Reviewer #2 (Remarks to the Author):
Comments to the Author: The manuscript 'Hierarchical Graph Learning for Protein-Protein Interaction' submitted by Gao et al deals with a novel hierarchical graph learning based method allowing the authors to predict interactions for given protein pairs and key residues for their interactions. The authors used multitype human protein-protein interactions (PPI) and showed that their method demonstrates high accuracy and robustness in predicting PPIs. At the core of their method is a synergistic predictive effect of two levels: the level of the structural representations of the protein itself (lower level) and the level of the PPI network (upper level). By mutually sharing the information learned in each of the lower and upper levels, it leads to better structural representations of proteins and learning relationships between proteins. Their method is novel in that it models the natural PPI relationship by hierarchically linking both levels. In addition, it is also interesting to graphically represent protein and network structures using Graph Neural Networks (GNN) for this modelling. Furthermore, the authors have rigorously evaluated the robustness of their method from various viewpoints, increasing the reliability of their method. The manuscript is well written and well-illustrated. I think that it is worthy of being published in Nature Communications. I only have some minor questions and comments:
Thanks for your in-depth explanation of the core methodology of our work and for your recognition of our contribution. To address your constructive comments, we provide point-by-point responses below.
Minor questions and comments: -Page 7, in the section 2.1, line 134-136: The proposed method is trained and evaluated using a dataset consisting of 1,690 proteins and 7,624 PPIs. By simple calculation, one protein is involved in 4.5 PPIs (7,624 PPIs / 1,690 proteins = 4.511…). According to Park and Marcotte (Nat Methods. 2012 December;9(12): 1134-1136. doi:10.1038/nmeth.2259), PPI prediction methods tend to perform much better on test pairs that share the same proteins with the training dataset than those that do not. Although the generalization evaluation (OOD) has been done, it would be useful for readers to show the overall performance and robustness evaluation in C2 and C3 among 3 distinct classes proposed by Park and Marcotte.
Thank you for pointing out the interesting and enlightening way for revisiting the model performance. In response to your suggestions, we have designed experiments for the generalization evaluation (GE), which assesses the model's capacity to handle OOD data. Furthermore, we performed robustness evaluation (RE) by manually adding perturbations to the OOD dataset.

Experimental protocol:
The following points should be clarified before the experiments. First, we created 3 data partitions based on the SHS27k dataset. The test set of each partition contains 3 different classes of test pairs, with proportions C1, C2, and C3. Here, C1 stands for the percentage of PPIs of which both proteins were present in the training set (Class 1), C2 stands for the percentage of PPIs of which one (but not both) of the proteins was present in the training set (Class 2), and C3 stands for the percentage of PPIs of which neither protein was present in the training set (Class 3).
Second, we did not directly employ the random or OOD (BFS, DFS) partitioning methods presented in the manuscript to create the three partitions because of two limitations: 1) the C3 from random partitioning is almost 0%, and 2) the C1 from OOD partitioning is almost 0%. Thus, neither random nor OOD partitioning alone is suitable for statistical evaluation of the 3 distinct classes. To overcome this issue, we combine the random and OOD partitioning methods to ensure that the sizes of C1, C2, and C3 are reasonably balanced.
Third, the 3 datasets were directly used in the GE experiments. In RE, we additionally added perturbations to the training sets of the 3 created datasets and left the test sets unchanged.
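The three test-pair classes can be computed directly from a train/test split. The sketch below is a minimal illustration of the class definitions given above (both proteins, exactly one protein, or neither protein seen in training); the function name `pair_class_fractions` and the toy protein identifiers are assumptions.

```python
def pair_class_fractions(train_pairs, test_pairs):
    """Fraction of test pairs in each Park-and-Marcotte-style class.

    class1: both proteins of the test pair occur in the training set.
    class2: exactly one of the two proteins occurs in the training set.
    class3: neither protein occurs in the training set.
    """
    if not test_pairs:
        raise ValueError("test_pairs must not be empty")
    train_proteins = {protein for pair in train_pairs for protein in pair}
    label_by_seen = {2: "class1", 1: "class2", 0: "class3"}
    counts = {"class1": 0, "class2": 0, "class3": 0}
    for a, b in test_pairs:
        n_seen = (a in train_proteins) + (b in train_proteins)
        counts[label_by_seen[n_seen]] += 1
    total = len(test_pairs)
    return {name: count / total for name, count in counts.items()}

# Toy split with hypothetical protein identifiers.
train = [("A", "B"), ("B", "C")]
test = [("A", "C"), ("A", "D"), ("D", "E"), ("E", "F")]
fractions = pair_class_fractions(train, test)
```

For this toy split, one of the four test pairs falls in Class 1, one in Class 2, and two in Class 3. Checking these fractions is a quick way to confirm that a partition is balanced across the three classes before running the evaluation.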

Generalization evaluation experiments:
We train and test the model on the 3 produced datasets.
For each dataset, we show model performance on the overall dataset and on 3 distinct classes. For each experiment, we run 5 seeds and report the average F1 score.

Robustness evaluation experiments:
We follow the experimental protocol for robustness evaluation in the manuscript. For the 3 produced datasets, we add random perturbations (perturbation ratio 0.2) to their training sets and leave their test sets unmodified. For each dataset, the model's performance is reported both overall and separately for the 3 distinct classes.
For each experiment, we run 5 seeds, and then report the average F1 score.
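As an illustration of training-set perturbation, the sketch below removes some known interactions and adds the same total budget of random non-interacting pairs. The exact perturbation scheme is not spelled out in this response, so the even split between removals and additions, the function name `perturb_interactions`, and the treatment of interactions as unlabeled pairs are all assumptions.

```python
import random

def perturb_interactions(proteins, interactions, ratio=0.2, seed=0):
    """Randomly remove known pairs and add spurious ones.

    proteins: iterable of protein identifiers.
    interactions: iterable of (protein_a, protein_b) known pairs.
    ratio: number of changes as a fraction of the known pairs; the
    changes are split evenly between removals and additions (assumption).
    Returns a set of frozenset pairs representing the perturbed data.
    """
    rng = random.Random(seed)
    proteins = list(proteins)
    known = {frozenset(pair) for pair in interactions}
    n_changes = int(round(ratio * len(known)))
    n_remove = n_changes // 2
    n_add = n_changes - n_remove

    n_possible = len(proteins) * (len(proteins) - 1) // 2
    if n_add > n_possible - len(known):
        raise ValueError("not enough non-interacting pairs to add")

    # Sort first so that sampling is reproducible for a given seed.
    ordered = sorted(known, key=sorted)
    removed = set(rng.sample(ordered, n_remove))

    added = set()
    while len(added) < n_add:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in known:
            added.add(pair)
    return (known - removed) | added
```

Only the training pairs would be passed through such a function; the test pairs stay untouched so that the evaluation labels remain reliable.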

Results and discussions:
Table R7. Robustness experiments. We show the averaged F1 scores of our model and a strong baseline (GNN-PPI) across 5 seeds. 'Bl' and 'Ours' represent the solid baseline method and HIGH-PPI, respectively. Avg. gain shows the performance gain between our method and the baseline method.

Figure R5. Bar chart visualization for the GE and RE experiments.
We present the following core and secondary findings for the above generalization evaluation (GE) and robustness evaluation (RE).
[...] having little effect on performance on the respective classes. Thus, it seems that the proportion of the three test pair classes (supplementary Table 6) as well as the percentage of unknown proteins (Fig. 2c) in the test sets may both have a significant role in determining the degree of OOD in the dataset."
Please explain how you defined this threshold value. Is this value (~10 Å) a reasonable distance?
Thanks for your constructive comment. We concur that the cutoff distance (threshold) is worth discussing. Our explanation is based on an understanding of biophysics and machine learning. For ease of use, we abbreviate the cutoff distance as d_c.
In terms of deep graph learning, the major goal of representing proteins as graph-structured data is to define protein features linked to the 3D structure in a memory- and computation-efficient manner. The inter-residue forces, which are necessary for protein structural stability and in turn define protein properties, should normally be preserved by the selected d_c [R13, R14]. The ideal d_c, however, is inconclusive and depends on variables including the wet experimental environment and protein family differences, according to the majority of prior research [R15]. Pioneering work selects d_c = 8 Å (i.e., only residues within 8 Å can be connected) as the ideal range for residue-residue interactions (RRIs). This range can be used to characterize a variety of protein properties, including hydrophobic behavior [R16, R17], folding mode [R18, R19, R20], stability upon mutations [R21], and thermal stability [R22]. Nevertheless, some studies have found that shorter distances [R23] or longer distances [R24] are appropriate. A bigger d_c (more than 8 Å) reduces the neglect of RRIs, allowing the graph data to capture more protein features. For instance, d_c = 8 Å is adequate to reflect protein hydrophobic behaviors but falls short of capturing other traits, including protein geometric properties [R24]. But d_c cannot be arbitrarily large, since connecting weakly interacting residues results in erroneous predictions of protein features.
Based on the literature research and logical consideration, one hypothesis is that the optimal d_c varies across models and datasets, so d_c needs to be identified case by case. Therefore, we chose d_c = 10 Å based on an empirical study (Figure R6). In order to choose a suitable d_c, we generated 11 sets of protein adjacency matrices with a parameter sweep of d_c between 5.0 and 15.0 Å at 1.0 Å intervals. We ran 3 seeds for each d_c and report the average F1 score. This empirical study reveals that the optimal cutoff distance d_c is around ~10 Å in our model; the model's performance grows quickly at d_c = ~7 Å and declines at d_c = ~13 Å. Even though we selected d_c = 10 Å, it is important to note the very narrow range of F1 values. The optimal d_c can be selected from anywhere in the 9-12 Å range because the overall F1 scores vary only from 0.8621 to 0.8633 for d_c between 9 and 12 Å, which also agrees with the literature observations in the paragraph above.
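To make the cutoff-based graph construction concrete, the sketch below builds a residue adjacency matrix from 3D coordinates and sets up the 11-value sweep from 5.0 to 15.0 Å. It is a minimal illustration, not the authors' pipeline: the choice of one coordinate per residue (for example, the Cα atom), the strict `<` comparison, the removal of self-loops, and the toy coordinates are all assumptions.

```python
import numpy as np

def contact_adjacency(coords, cutoff=10.0):
    """Connect residues whose pairwise distance is below the cutoff.

    coords: array-like of shape (n_residues, 3), one 3D position per
    residue in angstroms (which atom represents a residue is an assumption).
    Returns an (n_residues, n_residues) 0/1 matrix without self-loops.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adjacency = (dist < cutoff).astype(int)
    np.fill_diagonal(adjacency, 0)
    return adjacency

# Sweep of cutoff distances: 5.0, 6.0, ..., 15.0 (11 values).
cutoffs = np.arange(5.0, 16.0, 1.0)

# Toy coordinates for three residues placed on a line.
toy_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [12.0, 0.0, 0.0]]
edge_counts = {float(c): int(contact_adjacency(toy_coords, c).sum()) // 2
               for c in cutoffs}
```

A larger cutoff can only add edges, so the graphs become denser along the sweep; each resulting adjacency set would then be used to train and score the model.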